OS4M: Achieving Global Load Balance of MapReduce Workload by Scheduling at the Operation Level

Authors

  • Liya Fan
  • Bo Gao
  • Fa Zhang
  • Zhiyong Liu
Abstract

The efficiency of MapReduce is closely related to its load balance. Existing works on MapReduce load balance focus on coarse-grained scheduling. This study concerns fine-grained scheduling of MapReduce operations, with each operation representing one invocation of the Map or Reduce function. By default, MapReduce adopts a hash-based method to schedule Reduce operations, which often leads to poor load balance. In addition, the copy phase of Reduce tasks overlaps with Map tasks, which significantly hinders the progress of Map tasks due to I/O contention. Moreover, the three phases of Reduce tasks run in sequence while consuming different resources, thereby under-utilizing resources. To overcome these problems, we introduce a set of mechanisms named OS4M (Operation Scheduling for MapReduce) to improve MapReduce’s performance. OS4M achieves load balance by collecting statistics on all Map operations and calculating a globally optimal schedule to distribute Reduce operations. With OS4M, the copy phase of Reduce tasks no longer overlaps with Map tasks, and the three phases of Reduce tasks are pipelined based on their operation loads. OS4M has been transparently incorporated into MapReduce. Evaluations on standard benchmarks show that OS4M’s job duration can be shortened by up to 42%, compared with a
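The contrast the abstract draws between hash-based scheduling and load-aware scheduling of Reduce operations can be illustrated with a minimal sketch. This is not OS4M's algorithm, which the abstract does not specify; it substitutes a simple greedy heuristic (heaviest key first, onto the least-loaded reducer) for the "globally optimal schedule". The function names, the per-key load table, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import heapq
import zlib


def hash_schedule(key_loads, num_reducers):
    """Hash-based placement: reducer = hash(key) mod R, ignoring load.

    CRC32 stands in for the framework's hash function (an assumption).
    Returns the total load placed on each reducer.
    """
    loads = [0] * num_reducers
    for key, load in key_loads.items():
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += load
    return loads


def greedy_schedule(key_loads, num_reducers):
    """Load-aware placement using per-key statistics gathered up front.

    Greedy heuristic (not the paper's method): process keys from heaviest
    to lightest and give each one to the currently least-loaded reducer.
    Returns (key -> reducer assignment, per-reducer total load).
    """
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment = {}
    for key, load in sorted(key_loads.items(), key=lambda kv: -kv[1]):
        total, reducer = heapq.heappop(heap)
        assignment[key] = reducer
        heapq.heappush(heap, (total + load, reducer))
    loads = [0] * num_reducers
    for key, reducer in assignment.items():
        loads[reducer] += key_loads[key]
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical per-key Reduce loads (e.g. number of values per key).
    sample = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}
    print("hash-based loads:", hash_schedule(sample, 2))
    print("greedy loads:    ", greedy_schedule(sample, 2)[1])
```

On the hypothetical sample above, the greedy placement puts a load of 10 on each of the two reducers, which is the best possible split of a total load of 20; the hash-based placement has no such guarantee because it never looks at the loads.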

Similar resources

Improving the Load Balance of MapReduce Operations based on the Key Distribution of Pairs

Load balance is important for MapReduce to reduce job duration, increase parallel efficiency, etc. Previous work focuses on coarse-grained scheduling. This study concerns fine-grained scheduling of MapReduce operations. Each operation represents one invocation of the Map or Reduce function. Scheduling MapReduce operations is difficult due to highly skewed operation loads, no support to collect w...

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. Current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy for MapReduce assume a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...

ROUTE: run-time robust reducer workload estimation for MapReduce

MapReduce has become a popular model for large-scale data processing in recent years. Many works on MapReduce scheduling (e.g., load balancing and deadline-aware scheduling) have emphasized the importance of predicting workload received by individual reducers. However, because the input characteristics and user-specified map function of a given job are unknown to the MapReduce framework before ...

Parallel Processing Letters Parallel Incremental Scheduling

Parallel incremental scheduling is a new approach for load balancing. In parallel scheduling, all processors cooperate together to balance the workload. Parallel scheduling accurately balances the load by using global load information. In incremental scheduling, the system scheduling activity alternates with the underlying computation work. This paper provides an overview of parallel incremental sc...

Grid load balancing using intelligent agents

Workload and resource management are essential functionalities in the software infrastructure for grid computing. The management and scheduling of dynamic grid resources in a scalable way requires new technologies to implement a next generation intelligent grid environment. This work demonstrates that AI technologies can be utilised to achieve effective workload and resource management. A combi...

Journal:
  • CoRR

Volume abs/1406.3901

Publication date: 2014